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Abstract —Many sensors, such as range, sonar, radar, GPS 
and visual devices, produce measurements which are contam¬ 
inated hy outliers. This problem can he addressed hy using 
fat-tailed sensor models, which account for the posslhility of 
outliers. Unfortunately, all estimation algorithms belonging to 
the family of Gaussian filters (such as the widely-used extended 
Kalman filter and unscented Kalman filter) are inherently in¬ 
compatible with such fat-tailed sensor models. The contribution 
of this paper is to show that any Gaussian filter can be made 
compatible with fat-tailed sensor models by applying one simple 
change: Instead of filtering with the physical measurement, 
we propose to filter with a pseudo measurement obtained by 
applying a feature function to the physical measurement. We 
derive such a feature function which is optimal under some 
conditions. Simulation results show that the proposed method 
can effectively handle measurement outliers and allows for 
robust filtering in both linear and nonlinear systems. 

I. Introduction 

Robust and accurate state estimation is essential to safely 
control any dynamical system. However, many sensors, such 
as range, sonar, radar, GPS and visual devices, provide mea¬ 
surements populated with outliers. Therefore, the estimation 
algorithm must not be unduly affected by such outliers. 

In this paper we argue that problems with outliers are a 
direct consequence of unrealistic, thin-tailed sensor models. 
Unfortunately, many widely-used estimation algorithms are 
inherently incompatible with more realistic, fat-tailed sensor 
models. This holds true for the extended Kalman filter (EKF) 
[1], the unscented Kalman filter (UKF) [2], and any other 
member of the family of Gaussian filters (GF) [3], as we 
will show in Section IV-A. 

The contribution of this paper is to show that any member 
of the family of GFs can be made compatible with fat¬ 
tailed sensor models by applying one simple change: In¬ 
stead of filtering with the physical measurement, we filter 
with a pseudo measurement. This pseudo measurement is 
obtained by applying a time-varying feature function to the 
physical measurement. We derive a feature function which is 
optimal under some conditions. In simulation experiments, 
we demonstrate the robustness and accuracy of the proposed 
method for linear as well as nonlinear systems. 

Numerous robustification methods have been proposed for 
individual members of the family of GFs, often involving 
significant algorithmic changes. In contrast, the proposed 
method can be applied to any GF with only minor changes 
in the implementation. Any existing GF implementation can 
be robustified by merely replacing the sensor model with a 
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pseudo sensor model, and the physical measurement with a 
pseudo measurement. 

II. Related Work 

Ad-hoc procedures for reducing the influence of outliers 
have been employed by engineers for a long time. One such 
heuristic is to simply discard all measurements which are 
too far away from the expected measurement. This approach 
lacks a Arm theoretical basis and there is no rigorous way of 
choosing the thresholds. Furthermore, the information con¬ 
tained in measurements outside of the thresholds is discarded 
completely, which can lead to decreased efficiency [4]. For 
these reasons, significant research effort has been devoted to 
robustifying GFs in a principled manner. In the following we 
distinguish two main currents on robust filtering, the first is 
based on robust statistics in the sense of [5] and the second 
is based on fat-tailed sensor models. 

A. Robust Statistics 

In the framework of robust statistics in the spirit of [5], 
the objective is to find an estimator with a small variance 
when the Gaussian noise is contaminated with noise from 
a broad class of distributions. The resulting estimators are 
intermediary between the sample mean and the sample 
median. For instance, Masreliez and Martin [6] propose such 
an estimator for linear systems. This approach is extended 
by Schick and Mitter [4]. 

B. Fat-tailed Sensor Models 

Since fat-tailed sensor models are by definition non- 
Gaussian, finding the posterior estimate is not trivial. In 
particular, a lot of effort has been devoted to finding filtering 
recursions for models with Student f-distributed noise. 

Roth et al. [7] show that for linear systems where the noise 
and the state are jointly f-distributed, an exact filter can be 
found. The authors mention that these noise conditions are 
rarely met in practice, and propose an approximation for 
state-independent (-distributed noise. A different approxima¬ 
tion scheme for linear systems with (-distributed noise is 
proposed in Meinhold and Singpurwalla [8]. 

While those approximations are hand-crafted. Ting et al. 
[9] and Sarkka and Nummenmaa [10] use variational in¬ 
ference techniques to find an optimal approximation to the 
posterior. Agamennoni et al. [11, 12] unify and generalize 
those methods. 

C. Extensions to Nonlinear Systems 

All methods mentioned above assume a linear sensor 
model. It is possible to apply them to nonlinear systems by 
linearizing the sensor model at each time step, as is done in 


the EKF. However, the EKF has been shown to yield poor 
performance for many nonlinear systems [13, 2, 14]. 

Application of these robustification methods to other mem¬ 
bers of the family of GEs, such as the UKE or the divided 
difference filter (DDE) [15], is not straightforward. 

One way of doing so is proposed by Karlgaard and Schaub 
[16], who use a robust Huber estimator [5] in a DDE. 
Similarly, Piche et al. [17] propose a method of extending 
the mentioned linear Student t-based filtering methods to 
nonlinear GEs. However, both of these methods rely on an 
iterative optimization at each time step, which is computa¬ 
tionally expensive. In contrast, the robustification proposed 
in this paper allows to robustify any of the numerous GE 
algorithms with just minor changes in the implementation. 

HI. Eiltering 

A discrete-time state-space model can be defined by 
two probability distributions: a transition model p{xt\xt-i), 
which describes the evolution of the state in time, and a sen¬ 
sor model p{yt\xt), which describes how the measurement 
yt is generated given the state Xt- Alternatively, these two 
models can also be written in functional form 

Xt = g{xt-i,vt) ( 1 ) 

yt = h{xt,wt) ( 2 ) 

with Vt and Wt being normally distributed noise variables. 
Note that any (even non-Gaussian) model can be specified 
in this way, since vt and wt can be mapped onto any desired 
distribution inside the nonlinear functions g{-) and h{-). 


A. Exact Filtering 

Eiltering is concerned with estimating the current state 
Xt given all past measurements yi-t = {yi,. ■. ,yt}- The 
posterior distribution of the current state p{xt\yi-t) can be 
computed recursively from the distribution of the previous 
state p{xt-i\yi-.t-i)- This recursion can be written in two 
steps: a prediction step’ 

p{xt\yi:t-i) = / p{xt\xt-i)p{xt-i\yi:t-i) (3) 

Jxt-l 


and an update step 

p{xt\yi-.t) 


p{yt\xt)p{xt\yi:t-i) 

p(ytlxt)p(xtlyi:t-i)' 


(4) 


These equations can generally not be solved in closed form 

[18] . The most notable exception is the Kalman filter (KE) 

[19] , which provides the exact solution for linear Gaussian 
systems. Significant research effort has been invested into 
generalizing the KE to nonlinear dynamical systems. 


B. Gaussian Filtering 

The KE and its generalizations to nonlinear systems (e.g. 
the EKE and the UKE) are members of the family of GEs 
[3, 20, 14, 21]. GEs approximate both the predicted belief (3), 
as well as the posterior belief (4) with Gaussian distributions. 


In the prediction step (3), the exact distribution is approx¬ 
imated by a Gaussian^ 

p{xt\yi:t — l) — ■N'ixt\y'Xti'^XtXt)- (5) 

The prediction step is not affected by the type of sensor 
model used and will therefore not be discussed here, see for 
instance [3, 20, 14, 21] for more details. 

We will only consider the update step (4) in the remainder 
of the paper. Eor ease of notation, we will not write the 
dependence on past measurements yi-t-i explicitly anymore. 
The remaining variables all have time index t, which can 
therefore be dropped. The predicted belief p{xt\yi-.t-i) can 
now simply be written as p{x), and the posterior belief 
p{xt\yi,t) as p{x\y), etc. 

As shown in [20], the GE can be understood as finding 
an approximate Gaussian posterior q{x\y) by minimizing the 
Kullback-Leibler divergence [22] to the exact joint distribu¬ 
tion 


argminKL[p(a;,y)|g(a:|y)]. (6) 

q 

The form of q{x\y) is restricted to be Gaussian in x 

q{x\y) = Af {x\m{y),E) (7) 


with the mean being an affine function of y 

m{y) = ^ Q • (8) 

This minimization is performed at each update step and 
yields the optimal parameters of the approximation (7) 

M = ~ '^xy^yy Py '^xy'^yy ) (9) 

S = '^xx ~ ^xy'^yy ^xy (10) 


See [20] for a detailed derivation of this result. The parame¬ 
ters px and Tixx are given by the belief (5) computed in the 
prediction step. The remaining parameters are defined as 

Pv= yp{y) (11) 

Jy 

'^yy = i {y - Py){y - PyYp{y) (12) 

^xy= (x-px){y-Pyypix,y). (13) 

Jx,y 

Eor a linear system, this solution corresponds to the KE 
equations [20]. 

Numeric Integration Methods: Eor most nonlinear sys¬ 
tems, the integrals (11), (12) and (13) cannot be computed in 
closed form and have to be approximated. In the EKE, this 
is done by linearization at the current mean estimate of the 
state Px- This approximation does not take the uncertainty 
in the estimate into account, which can lead to large errors 
and sometimes even divergence of the filter [13, 14]. 

Therefore, approximations based on numeric integration 
methods are preferable in most cases. Deterministic Gaussian 
integration schemes have been investigated thoroughly, and 
the resulting filters are collected under the term Sigma Point 


*We use the notation f^i') as an abbreviation for jyy(-) dx. 


^N{z\ti, S) denotes the Gaussian with mean fi and covariance S. 



Kalman Filters (SPKF) [13]. Well known members of this 
family are the UKF [2], the DDF [15] and the cubature 
Kalman filter (CKF) [23]. Alternatively, numeric integration 
can also be performed using Monte Carlo methods. The 
method presented in this paper applies to any GF, regardless 
of which particular integration method is used. 

IV. A Case for Fat Tails 

Measurement acquisition is typically modeled by a Gaus¬ 
sian or some other thin-tailed sensor model. This assumption 
is usually made for analytical convenience, not because it 
is an accurate representation of the belief of the engineer. 
If an engineer were to believe that measurements are in 
fact generated by a Gaussian distribution, then she would 
have to accept a betting ratio of 7 x 10^"^ to 1 that no 
measurement further than 8 standard deviations from the 
state will occur.^ Few engineers would be interested in such 
a bet, since one can usually not exclude the possibility of 
acquiring a large measurement due to unexpected physical 
effects in the measurement process. 

The mismatch between the actual belief and the Gaussian 
model can lead to counter-intuitive behavior of the inference 
algorithm. More concretely, the posterior mean is an affine 
function of the measurement. This implies that the shift in 
the mean produced by a single measurement is not bounded. 

This problematic behavior disappears when using a more 
realistic, fat-tailed model instead of the Gaussian model [8]. 
There are several dehnitions of fat-tails which are commonly 
used [25]. Here, we simply mean any distribution which 
decays slower than the Gaussian. Which particular tail model 
is used depends on the application. 


A. The Gaussian Filter using Fat Tails 

The GF approximates all beliefs with Gaussians, but the 
sensor model can have any form. In principle, nothing 
prevents us from using the GF with a fat-tailed sensor model. 
Unfortunately, the GF is not able to do proper inference 
using such a model. The sensor model p{y\x) enters the 
GF equations only through (11), (12) and (13). To make 
this dependency explicit, we substitute p{y) = j^p{y\x)p{x) 
in (11) and (12), and p{x,y) — p{y\x)p{x) in (13), and 
integrate in y 


Ty= Ty\xix)p{x) (14) 

J X 

'^yy “b (^) Ty (15) 

^xy — / {,X px'){yy\xix^ ( 16 ) 

J X 


What is important to note here is that these equations only 
depend on the sensor model through the conditional mean 
and the conditional covariance 



yp{y\x) 

{y - Ty\x{x)){y 


(17) 

Ty\xix)yp{y\x). (18) 



Fig. 1; Simulation of the system with fat-tailed measurement 
described in Example 4.1. 


X 



(a) The exact density p{xi\yi) (white means higher). Overlaid: 
contour lines of the approximate density q{xi\yi) given by a fat¬ 
tailed GF (green) and a thin-tailed GF (orange). 



(b) True state of the system over time, with the obtained estimates 
and their standard deviation. 

Fig. 2: Standard GF applied to the system with fat-tailed 
measurement described in Example 4.1. 


Since fat-tailed sensor models typically have very large 
or even infinite covariances, the GE will behave as if the 
measurements were extremely noisy. It achieves robustness 
by simply discarding all measurements, which is obviously 
not the behavior we were hoping for. 

B. Simulation Example 

To illustrate this problematic behavior, we apply the GE 
to the following dynamical system: 

Example 4.1: System specification^ 

p{xt\xt-i) = N{xt\xt-iy.Q) (19) 

p{yt\xt) = 0.9 N{yt\xt, 1.0) + 0.1 C{yt\xt, 10.0) (20) 
p(a;o) =Ab(a;o|0.0,1.0) (21) 

The measurements are contaminated with Cauchy- 
distributed noise, which leads to occasional outliers, as 


^According to De Finetti’s definition of probability [ 24 ]. 


^C(2|/i.,7) denotes the Cauchy distribution with location ^ and scale 7. 
















shown in Figure 1. We apply two GFs to this problem. The 
first uses a sensor model which does not take into account the 
fat-tailed Cauchy noise, it only models the Gaussian noise, 
i.e. the left term in (20). The second GF uses a sensor model 
which is identical to the true sensor (20). We will refer to 
the first hlter as the thin-tailed GF, and to the second filter 
as the fat-tailed GF. 

In Figure 2a, we show the exact density p{xi\yi) after the 
hrst hltering step. The approximations obtained by the thin¬ 
tailed GF (yellow) and the fat-tailed GF (green) are overlaid. 
It can be seen that the approximation to the exact posterior is 
very poor in both cases. The mean of the exact density p{x\y) 
is approximately linear in y for small y. For measurements 
y larger than about 5.0, the posterior mean reverts back to 
the prior mean 0.0 and does not depend on y anymore. 

This behavior cannot be captured by an approximation of 
the form of (8), since it only allows for linear dependences 
in y. The approximation by the thin-tailed GF fits the exact 
posterior well for small y, but instead of flattening out it 
keeps growing linearly for large y. Hence, it is not robust to 
outliers. The approximation by the fat-tailed GF correctly 
captures the behavior of the exact posterior for large y, 
i.e. it is independent of y. However, this implies that all 
measurements, not just outliers, are ignored, as expected 
from the analysis in Section IV-A. For both filters, the 
poor ht translates to poor filtering performance, as shown 
in Figure 2b. 

V. A Measurement Feature for Robustification 


where we have collected the terms independent of q{x\y) 
in C. Since there is no constraint on m{y), (24) can be 
optimized for each y independently. This means that the 
integral can be dropped, and we can simply minimize the 
integrand YilJ^p{x\y)\q{x\y)\ with respect to rn{y). It is a 
standard result from variational inference that the optimal 
parameters of a Gaussian approximation are obtained by 
moment matching [26]. That is, the optimal mean function 
m*{y) of the approximation is simply equal to the exact 
posterior mean 

w*(y) = y.x\y{y) = / xp{x\y). (25) 

J X 

Therefore, the feature vector ip{y) would ideally be chosen 
such that pLx\y{y) c™ be expressed through a linear combi¬ 
nation of features. Unfortunately, pix\y{y) cannot be found 
in closed form in most cases. 

The standard GF represents the mean of the posterior as an 
affine function of y. This form is optimal for linear Gaussian 
systems, and it serves as a good approximation for many 
nonlinear thin-tailed systems. Similarly, the idea here is to 
find the optimal feature for a linear Gaussian system with 
an additive fat tail. This feature can be expected to provide 
a good approximation for nonlinear fat-tailed systems. 

B. The Optimal Feature for a Linear, Fat-Tailed Sensor 
Suppose that we have a linear Gaussian sensor model 

b(^y\x) = Af (y\Axa, P) (26) 


To enable the GF to work with fat-tailed sensor models, 
we hence have to change the form of the approximate belief 
(7). In [20] it is shown that more flexible approximations 
can be obtained by allowing for nonlinear features in y. The 
mean function (8) then becomes 




( 22 ) 


The resulting hlter is equivalent to the standard GF using 
a virtual measurement which is obtained by applying a 
nonlinear feature function (p{-) to the physical measurement. 

In the following, we hnd a feature ipf) which enables the 
GF to work with fat-tailed sensor models. Instead of hand¬ 
designing such a feature, we attempt to hnd a feature which 
is optimal in the sense that it minimizes the KL divergence 
between the exact and the approximate distribution (6). 

For this purpose, we hrst hnd the optimal, non-parametric 
mean function m*(y) with respect to (6). Knowing that the 
mean m(y) is an affine function (22) of the feature ip(y), we 
can then deduce the optimal feature function <p*{y). 


A. The Optimal Mean Function 

In order to hnd the function m* (y) which minimizes (6), 
we rewrite the objective (6) 

KL[p{x,y)\q{x\y)] = J log p(a;,y) (23) 

= / ^[Pix\y)\qix\y)]p{y)C (24) 
Jy 


which we refer to as the body. We would like to add a fat 
tail f(-) to make the hlter robust to outliers. The combined 
sensor model with tail weight 0 < w < 1 is then 


p{y\x) = (1 - uj)h{y\x) + i^t{y\x). 


(27) 


1) Assumptions on the Form of the Tail: The precise shape 
of the tail is application specihc and does not matter for the 
ideas in this paper. However, the subsequent derivation relies 
on the assumption that the tail t{y\x) is almost constant in 
X on the length scale of the standard deviation of the belief 
p{x). This allows us to treat p{x) like a Dirac function with 
respect to t{y\x). More concretely, we will assume that the 
approximation 

[ fix)tiy\x)p{x) :iit{y\px) [ fix)p(x), (28) 

J X J X 

is accurate for any affine function f(x). 

This is a reasonable assumption, since the tail accounts 
for unexpected effects in the measurement process, which 
by dehnition bear little or no relation to the state x. For 
instance, Thrun et al. [27] suggest to use a tail which is 
independent of the state x and uniform in y, to account for 
outliers in range sensors. For such uniform tails, (28) is exact. 
For state-dependent tails, we expect this approximation to be 
accurate enough to provide insights into the required form 
of the feature. 

2 ) The Conditional Mean: We will now hnd the posterior 
mean p,x\y{y) for this measurement model, which will then 
allow us to hnd the optimal feature. The posterior mean can 
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Fig. 3: The approximate optimal mean function (32) for the 
dynamical system from Example 4.1 (at time t = 1). 



Fig. 4: The three components of the optimal feature (36) for 
the dynamical system from Example 4.1 (at time t = 1). 


be obtained from the predicted belief p{x) and the sensor 
model p{y\x) using Bayes’ rule 





xp{x\y) 


L xp(ylx)p(x) 
f^p(ylx)p(x) 


(29) 


Inserting (27) we obtain 


^ (1 - L xb(ylx)p(x) +ujf^ xt(ylx)p(x) 

^ L Hylx)p(x) + UJ t(ylx)p(x) 

(30) 


Both the predicted belief p(x) = N{x\pLxj^xx) and the 
body of the sensor model h{y\x) are Gaussian. Therefore, the 
integrals in the first term of the numerator and the first term 
in the denominator can be solved analytically using standard 
Gaussian marginalization and conditioning. The integrals in 
the second terms of the numerator and the denominator can 
be approximated according to (28), and we obtain 

Px\y(y) ~ i^x\y(y) (31) 

_ (1 - cu)(rf + Dy)M{y\pl, + ujpJ{y\p^) 

(1 - uj)N(y\pl, TPyy) + ujt{y\p^) 

where we have defined 


D = P-^ A)-^ P-^ (33) 

d = (E-,1 + - A^P-^a). (34) 


The expectations in (32) 


Fy = / / yb{y\x)p{x) 

JX Jy 

Ky=[ [(y~ ~ ylyKy\x)p{x) 

J X J y 


(35) 


only involve the body, and not the tail distribution. Hence, 
we avoid the problems related to the potentially huge or even 
infinite covariance of the tail discussed in Section IV-A. 


In Figure 3, we plot the optimal mean function (32) for 
dynamical system in Example 4.1 (at time t = 1). For y close 
to the expected measurement py = 0, the conditional mean 
(32) is approximately linear in y. If a measurement y of large 
magnitude is obtained, then the tail becomes predominant, 
and the posterior mean reverts to the prior mean = 0. 

The standard GF attempts to approximate this function by 
an affine function (8). Not surprisingly, this yields very poor 
results, as shown in Figure 2a. 


3) The Optimal Feature: To identify the feature required 
to express the optimal mean m*{y) = Px\y{y) ~ Fxlyiy)^ we 
compare (32) to (22). All the constant terms can be collected 
in M = (O d D p^) and all the terms which depend on 
y are part of the feature’’ 



f {l-Lo)A(y\pl,Yly)\ 
y{l - ui)Af{y\pl,Yly) 
ujt{y\px) y 


p(y) _ 

^Wiy\pl, Y^y) + u:t{y\px)' 


In Figure 4, we plot the three dimension of (36) for the 
Example 4.1. All of the feature components are asymp¬ 
totically constant in y, which means that the estimate 
remains bounded for arbitrarily large measurements. The 
three components have intuitive interpretations. The first 
two components are approximately constant and linear in y 
respectively, for measurements close to the expected value. 
Hence, they allow the filter to express an affine dependence 
on y which will vanish for very large measurements. The 
third component is small for y close to the expected value, 
and grows up to some constant for y which are large. It 
hence allows the mean estimate to revert to a constant value 
for large measurements. 

For the special case of w = 0, the feature becomes 

piy) = y,y,oy- (37) 

Thus, if the sensor model does not have a fat tail, the standard 
Gaussian Filter is retrieved. The linear mean function (8) is 
a special case of the feature mean function (22). 

VI. The Robust Gaussian Filter 

In the previous section, we found the approximately opti¬ 
mal measurement feature for a linear Gaussian sensor model 
with additive fat tails. The GF can hence be enabled to work 
with fat-tailed sensor models by filtering in feature space. 
This robustification can be applied to any member of the 
family of GFs, be it the EKF or an SPKF We will refer 
to the hlter obtained by using the feature (36) as the robust 
Gaussian hlter (RGF). 

For nonlinear, fat-tailed models, the RGF will not be 
optimal, but it provides a good approximation in the same 
way the standard GF provides a good approximation to 

^ The factors (1—a;) and ui in the numerator could equally well have been 
collected in M instead of the feature, since they are constant. However, we 
prefer to maintain these terms in the feature since they provide appropriate 
scaling. 




















Algorithm 1 Gaussian Filter 

Input: p{xt-i\yi:t-i),yt, 9 {-), h{-) 

Output: p{xt\yi-.t) 

1: Pixt\yi:t-i) = predict[p{xt-i\yi-.t-i), g{-)] 

2 : p{xt\yi:t) = update[p{xt\yi:t-i),h{-),yt] 

3: Return p(a;t|yi:t) 

Algorithm 2 Robust Gaussian Filter 

Input: p{xt-i\yi:t-i 
Output: p{xt\yi-.t) 

1: p{xt\yi:t-i) = predict[p{xt-i\yi-.t-i), gi-)] 

2: = P^edict[p{xt\yv.t-i),h^i-)] 

3: pti-) = featurelp^^, > as in (36), 

given from Step 1 and from Step 2. 

4: p{xt\yi:t) = update[p{xt\yi:t-i),pt{h{-)),pt{yt)] 

5: Return p(xt It/i;t) 


nonlinear, thin-tailed sensor models. If the RGF is applied 
to a sensor model without a fat tail, it will coincide with the 
standard GF, since the feature reduces to a linear function 
(37). Hence, the RGF extends the GF. It broadens its domain 
of applicability to fat-tailed sensor models. 

Algorithm: For clarity, we describe the RGF algorithm 
here step by step. Since this involves variables of several 
time steps, we will reintroduce the time indices which we 
dropped earlier. 

The standard GF is described in Algorithm 1 . The input to 
the algorithm are the previous belief, the new measurement 
yt, the transition model (1) and the sensor model (2). The 
GF simply predicts, then updates, and finally returns the new 
estimate. The concrete implementation of the predict and 
the update functions depends on whether we are using an 
EKF, a UKF, a DDF or some other GF. 

The RGF is described in Algorithm 2. It requires the same 
inputs as the GF, and additionally the separate components 
of the sensor model; body, tail, and tail weight. In particular, 
the functional form of the body is used in Step 2, while 
the feature computation in Step 3 requires the tail weight oj 
and the evaluation of the tail’s distribution t{-). 

The RGF delegates all the main computations to the basic 
GF through the predict and the update functions. The 
overhead in the implementation and in the computational cost 
is minor. Hence, the proposed method makes it straightfor¬ 
ward to robustify any existing GF algorithm. 

VH. Simulation Experiments 

In this section, we evaluate the RGE through simulations. 
Eirst, we show that the optimal feature enables a good fit of 
the approximate belief to the exact posterior in the linear 
system used in previous sections. Secondly, we evaluate 
the sensitivity of the RGE to the choice of tail parameters 
(Section VII-B). Einally, we show that the proposed feature 
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(a) The contour lines of the approximate density q{xi\yi) overlaid 
on the exact density p{xi\yi). 



(b) True state and filter estimates over time. 

Eig. 5: RGE applied to the system with fat-tailed mea¬ 
surement described in Example 4.1, to be compared to the 
standard GE in Eigure 2. 

(36), which we designed for a linear system, also allows for 
robustification in nonlinear systems (Section VII-C). 

We implemented Algorithm 2 using Monte Carlo as 
method for the numeric integration required by the 
predict and the update functions®. 

A. Application to a Linear Filtering Problem 

We revisit the simulation in Example 4.1 applying this 
time the RGE, using the true transition and sensor models. 
Comparing Eigure 2a to Eigure 5a, it is clear that the feature 
(36) allows for a much better fit of the approximation to 
the true density. As expected, this improved fit translates to 
a better filtering performance (Eigure 5b). As desired, the 
proposed method is sensitive to measurements close to the 
expected values, but does not react to extreme values. 

B. Robustness to Tail Parameters 

To show that the RGE is not sensitive to the specific choice 
of the tail parameters, we simulate the same system as above, 
and run several RGEs with different tail parameters. Eirst, 
we apply a RGE using a sensor model matching the true 
sensor, i.e. with tail parameters ui = 0.1, 7 = 10. Then, we 
apply two RGEs which use incorrect tail parameters. In one 
case we make both the weight and scale of the tail much 
lower than in the true distribution: w = 0 . 001 , 7 = 1.0 
(underestimation of the true tail). In the other case we make 
them much higher: uj = 0.5, 7 = 100.0 (overestimation). 
Eigure 6 shows almost no degradation in the performance. 
The key aspect enabling good filtering performance is that 

^Code is available at https ; //git-amd . tuebingen . mpg . de/ 
amd-clmc/python_gaussian_flitering 












Fig. 6: Robustness of the RGF to the choice of tail param¬ 
eters. The RGF behaves very similarly even when the tail 
parameters are severely under- or overestimated. 
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0.05 

s 
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km 

(Jy 

CO 
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O 
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15.8 

km 

/^o 

0.59 
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0.63 

mrad 

Ho 

13.4 

km 

^con,6 

200 

mrad 

Gmo 

3.986 • 10® 

km®/s^ 

a 

0.15 


Ro 

6374 

km 





TABLE I: Simulation parameters. 


the sensor model has a tail which decays slower than the 
Gaussian distribution, even when the shape of the true tail 
is not precisely known. 

C. Application to a Nonlinear Filtering Problem 

As an example of nonlinear filtering, we consider the 
problem of using measurements from a radar ground station 
to track the position of a vehicle that enters the atmosphere 
at high altitude and speed. The measurements provided by 
the radar are range and bearing angle to the target vehicle. 
This type of problem has been used before to compare the 
capability of filters to deal with strong nonlinearities, e.g. 
[28, 16]. 

The noise in radar systems is typically referred to as glint 
noise in the literature, and is known to be contaminated 
with outliers [29, 30, 16, 31, 32]. It has been modeled in 
different ways, e.g. using a Student t distribution, or as a 
mixture of two zero-mean Gaussian distributions (one with 
high weight and low variance and another with low weight 
and high variance), see [31] and references therein. In this 
section, we simulate the same system as in [28], but replace 
their Gaussian measurement noise with a mixture of two 
Gaussians as in [16, 31]. 

State Transition Process: The state consists of the position 
of the vehicle its velocity xW), and an 

unknown aerodynamics parameter which has to be 
estimated. The state dynamics are 

= x^ -f Axf' (38) 

x[+i = xf' -f Ax[^' (39) 

x[^i = xf' -I- A(i9txp^ -f Gtx\^^) + v/A tT„ vf^ (40) 

x[+i = xf' + /S{Dtxf^ + Gtxf^) + sfK ay vf' (41) 

^[ 5 ] _ ^,[ 5 ] (Aa\ 


where and follow a standard normal distribution, 
and A is the discretization time step. The drag and gravity 
coefficients, Dt = —exp ^ 

depend on the distance of the object to the c entre of the Earth 
Rt = its speed Vt = \J + (x[‘*^)2, 

and its unknown ballistic coefficient Pt = Po exp(x|^^). Other 
quantities such as the nominal ballistic coefficient Pq and the 
mass mo and and radius Rq of the Earth are constant, see 
Table I. 

Sensor Model: The radar is located at {xr,yr) and mea¬ 
sures range and bearing angle 9t to the target object 

n = \J (x[^' - XrY + {x'f - yrY + (43) 

6t = 10^ arctan [ ^ J + (44) 

\x[ -XrJ 

tu ~ (1 - a)Af{w\Q, E„om) + aA/‘(w| 0 , Scon)- (45) 

The nominal noise covariance is represented by Snom = 

Scon = diag([crc™ ^]) is the 

covariance of the contaminating noise component. We use a 
and covariances similar to [31], see Table I. 

Filter Specification: We compare an RGE with two GEs. 
The three filters use transition models that coincide with the 
real process (38)-(42). 

In problems of this type, the contaminating noise is often 
not precisely known. Therefore, we make our RGE assume 
a measurement model as in Example 4.1 

Prgf{w) = 0.9 A/’(tu|0, ^nom) 

-f 0.1 C(u;|0,diag([(10(Tnom,r)^, (10cr„om,e)^])), (46) 

which makes use of some knowledge of the nominal noise 
Snom, while the shape of the tail and the mixing weight take 
default values. Similarly, the first GE only knows about Snom 

PGFthin(w) = A/'(w|0, ^nom) ■ (47) 

As discussed in Section IV-A, the GE is not able to 
produce accurate estimates in systems with large variance 
even if the true measurement process (45) is known. To show 
this empirically, we apply a second GE which uses the true 
covariance of the sensor (45) 

PGFfat(w) = A/'(ui|0, (1 - a)Snom + ClScon)- (48) 

We simulate the system during 100 s, using the inte¬ 
gration time step A for the predictions and taking radar 
measurements at 1 Hz. As in [28], the initial state of the 
system is xq = [6500.4,349.14,-1.8093,-6.7967,0.6932], 
and the initial belief for all filters is centered at p,Q = 
[6500.4,349.14,-1.8093,-6.7967,0]. Note the mismatch 
between the true ballistic coefficient and the initial belief, 
i.e. the nominal Pq. 

Results: Eigures 7a and 7b respectively show the error in 
the estimate of xl^l and the corresponding velocity x^^l. We 
do not include the error in the position and velocity along 
the other dimension, since they are qualitatively similar. We 
can see that the GE using the nominal variance (yellow) 
reacts strongly to outliers. The GE using the true variance 












(a) Error between the estimated and the 
true state (position component). 


(b) Error between estimated and true state 
(velocity component). 


(c) Euclidean distance between estimated 
and true 2D position. 


Fig. 7; Results on the nonlinear filtering problem. The RGF deals well with the nonlinearities and the fat-tailed measurements. 


(green) of the sensor does not react as strongly. However, 
due to the large variance, it tracks the true state poorly. In 
contrast, the RGF (red) is robust to outliers and at the same 
time tracks the true state well. This translates to a low 2D 
location error as shown in Figure 7c. These results indicate 
that the optimal feature for linear systems allows to robustify 
nonlinear systems too. 

VIII. Conclusion 

In the standard GF algorithm, the mean estimate is an 
affine function of the measurement. We showed that for fat¬ 
tailed sensor models this provides a very poor approximation 
to the exact posterior mean. 

A recent result [20] showed that filtering in measurement 
feature space can allow for more accurate approximations of 
the exact posterior. Here, we have found the feature that is 
optimal for fat-tailed sensor models under certain conditions. 

We have shown both theoretically and in simulation that 
applying the standard GF in this feature space enables it to 
work well with fat-tailed sensor models. The proposed RGF 
is hence robust to outliers while maintaining the computa¬ 
tional efficiency of the standard GF. Any member of the 
family of GFs, such as the EKF or the UKF, can thus be 
robustified by the proposed method without changing any of 
the main computations. 

We have applied this algorithm to the problem of 3D object 
tracking using an Xtion range sensor [33]. The main source 
of outliers in this application are occlusions of the tracked 
object. While the standard GF immediately loses track of 
the object when occlusions occur, the RGF works well even 
under heavy occlusion. 
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